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We  present  an  algorithm  for  solving  convex  programs  with  nonlinear  con¬ 
straints.  The  algorithm  works  in  the  primal  space  only  and  uses  a  predictor- 
corrector  strategy  to  follow  a  smooth  path  that  leads  to  an  optimal  solution. 
The  algorithm  simultaneously  iterates  towards  feasibility  and  optimality.  The 
matrices  occurring  in  the  algorithm  can  be  kept  sparse  if  the  nonlinear  func¬ 
tions  are  separable  or  depend  on  few  variables  only.  Some  promising  numerical 
examples  obtained  from  a  preliminary  implementation  are  included. 
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1.  INTRODUCTION 

Following  Karmarkar’s  proof  of  polynomial- time  complexity  of  an  interior-point 
method  for  solving  linear  programs  [10],  efficient  numerical  implementations  for  the 
solution  of  linear  programs  by  related  interior-point  methods  have  been  presented; 
e.g.  [14,  15,  18]  and  many  others.  The  applicability  of  such  interior-point  methods 
to  nonlinear  convex  programs — like  problem  (CP)  below — was  soon  recognized  by 
Sonnevend  in  [28],  and  a  detailed  complexity  analysis  of  these  applications  was  pre¬ 
sented  in  [5,  6,  16,  23,  24],  showing  that  for  certain  classes  of  nonlinear  constraints 
essentially  the  same  speed  of  convergence  can  be  expected  as  for  a  linear  program. 

The  conversion  of  these  theoretical  results  into  numerical  algorithms  has  been 
very  slow  so  far.  In  this  paper,  we  intend  to  provide  some  encouragement  (supported 
by  numerical  experiments)  that  interior-point  methods  are  in  practice — not  just  in 
theory — an  efficient  tool  for  solving  certain  classes  of  convex  (and  possibly  non- 
convex)  problems. 

'This  work  was  supported  by  a  research  grant  from  the  Deutsche  F'orschungsgemeinschaft,  and 
by  the  U.S.  National  Science  Foundation  Grant  DDM-8715153  and  the  Office  of  Naval  Research 
Grant  NOOOM-90-.J- 1242. 

*On  leave  from  Institnt  fur  Angewandte  Mathomatik,  University  of  Wurzburg,  8700  Wurzburg, 
(West)  Germany. 


1.1.  The  Problem 

The  problem  under  study  is  to  find 


min{/o(i)  |  x  6  P  },  (CP) 

and  a  corresponding  optimal  solution  x*,  where  the  feasible  domain1  P  is  given  by 

P  :=  {x  e  ]Rn  \  fi(x)  <  0,  1  <  i  <  m},  (1.1) 

and  the  /,  :  S  — *  M  (0  <  i  <  m)  are  continuous  convex  functions,  i.e.  /,  £  C(S). 
We  assume  that  the  functions  fi  are  defined  on  a  common  closed  set  S  D  P  and 
that  the  /,  are  twice  continuously  differentiable  in  the  interior  S°  of  S.  The  first 
and  second  derivatives  are  assumed  to  be  known  and  are  denoted  by  column  vectors 
and  square  matrices: 


gt(x)  =  V/,-(*),  H,(x)  =  V2/,(i). 

The  algorithm  is  designed  to  work  properly  even  if  P  is  empty  or  unbounded.  The 
only  assumption  is  that  we  are  given  a  starting  point  x°  in  S°.  The  point  x°  may 
be  infeasible  for  (CP). 

2.  A  SIMPLE  BARRIER  METHOD 

2.1.  Barrier  Methods  in  General 

The  principle  of  a  barrier  method  for  approximating  the  solution  of  problem  (CP) 
is  based  on  the  ideas  in  [1,  2]  and  can  be  outlined  as  follows. 

For  n  =  p0,  Mi  ?  M2,  •  •  •  (where  1  =  Mo  >  Mi  >  M2  •  •  •  and  Pk  — *  0),  find 


ar(M)  :=  arg min  fQ(x)  +  v4>(x), 

i.e.  minimize  the  true  objective  function  /o  perturbed  by  p<^>,  where  4>(x)  is  a  smooth 
convex  “barrier  function”  for  the  set  P  (tending  to  infinity  as  x  approaches  the 
boundary  of  P  and  finite  in  P°). 

We  will  choose  the  function  <p(x)  =  -  X^i  log(-/,(x)).  For  this  definition  of  </> 
it  is  straightforward  to  see  that  <f>  is  smooth  and  convex  if  the  /;  are  so. 

It  is  well  known  (see  e.g.  [1])  that  the  minimizers  x(n)  are  unique  if,  for  example, 
P  is  bounded,  <t>  is  strictly  convex  and  fo  is  convex.  Further,  for  any  m  >  0  the  barrier 
term  n<}>(x)  ensures  that  x(fi)  is  feasible  for  (CP).  Fiacco  and  McCormick  showed 
under  weak  assumptions  that  the  minimizers  x(fi)  converge  to  an  optimal  solution  of 
problem  (CP)  as  the  perturbation  fi<f>(x)  of  the  objective  function  is  “phased  out”, 
i.e.  as  n  — *  0. 

'Here  we  assume  a  simple  (but  general)  form  of  P.  It  is  straightforward  to  include  linear 
constraints  and  upper  and  lower  bounds  on  the  variables  in  an  efficient  way. 
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2.2.  What  Makes  Them  Work 


In  what  follows  we  try  to  motivate  rather  informally  what  makes  a  barrier  method 
work. 

1.  An  immediate  advantage  of  a  barrier  method  is  that  a  constrained  optimization 
problem  is  solved  via  smooth  unconstrained  problems.  This  idea  was  early 
recognized  in  [1,  2]. 

2.  A  second  important  point  is  that  under  certain  conditions  [6,  23]  all  subprob¬ 
lems  are  of  the  same  “difficulty”  or  the  same  structure;  that  is,  no  matter  how 
small  fik  is,  the  domain  of  convergence  of  Newton’s  method  for  finding  x(/j.k) 
is  always  a  “fixed  percentage”  of  the  previous  level  set  {x|  x  6  P,  fo(x)  < 
/o(a:(/Zfc_ i ))} ,  and  the  “percentage”  is  independent  of  the  data  of  the  prob¬ 
lem.  This  fact  was  discovered  only  recently,  and  it  depends  on  the  choice  of 
the  barrier  function  and  the  constraint  functions.  For  a  detailed  analysis  we 
refer  to  [7,  24], 

Note  that  the  minimizers  x(/x)  converge  to  an  optimal  solution  of  (CP),  which 
usually  lies  at  the  boundary  of  P.  Since  the  barrier  function  goes  to  infinity  as 
its  argument  approaches  the  boundary,  one  might  expect  that  the  subproblems 
become  increasingly  harder  to  solve  as  tends  to  zero.  The  above  statement 
is  rather  surprising  in  that  it  guarantees  that — from  the  point  of  view  of  the 
size  of  the  domain  of  convergence  of  Newton’s  method — all  subproblems  are 
equally  hard  to  solve  (at  least  in  the  absence  of  rounding  errors). 

3.  A  third  nice  property  is  that  (under  weak  conditions)  x(/t)  is  a  smooth  curve 
in  n  (see  [1]),  and  the  tangent  to  x(n)  at  n  =  Hk  (pointing  to  the  “next  point” 
x(/tjt+i))  is  easily  computable  (see  Section  4.1). 

4.  Finally  it  is  also  important  that  under  weak  conditions,  one  can  show  that  the 
“distance”  \fo(x(n))-  /o(x*)|  is  0(n),  making  the  size  of  /j  a  reliable  stopping 
criterion. 

Below  we  state  a  conceptual  interior-point  algorithm.  The  algorithm  is  guaran¬ 
teed  to  converge  globally  with  a  linear  rate  of  convergence  that  is  independent  of  the 
problem  data  if  the  second  derivatives  of  the  constraint  functions  satisfy  a  relative 
Lipschitz  condition  as  defined  in  [6],  or  if  the  logarithmic  barrier  function  satisfies  a 
so-called  self-concordance  relation  defined  in  [24].  These  conditions  guarantee  that 
point  2  above  holds  true.  The  complexity  analyses  based  on  these  conditions  (as 
well  as  the  conditions  themselves)  are  quite  involved.  Here  we  only  mention  that 
a  fairly  general  class  of  convex  constraints  (in  particular  linear  or  convex  quadratic 
functions)  satisfy  these  conditions.  Our  numerical  experiments  suggest  that  the  al¬ 
gorithm  also  works  for  larger  classes  of  problems  for  which  a  theoretical  proof  of 
convergence  has  not  been  shown  yet.  In  this  paper  we  do  not  assume  either  of  the 
two  conditions  given  in  [7]  or  [24],  and  will  not  give  any  complexity  results. 


2.3.  Outline  of  a  General  Barrier  Method 

A  general  outline  for  a  barrier  method  can  be  stated  as  follows. 

Assume  no  —  1  and  x(no)  are  given.  Set  k  =  0. 

Do  until  convergence 

•  Compute  the  tangent  x‘(nk)- 

•  Select  nk+\  <  nk- 

•  Determine  a  prediction  for  the  next  iterate  by 

x(nk+l)  :=  *(Wk)  +  (/**+!  -  nk)x'(nk)- 


•  Find  x(fik+ 1)  by  Newton’s  method  starting  from  x(nk+i)- 

•  Set  k  =  k  +  1. 

End 

In  this  outline  we  suppressed  a  number  of  details  that  we  mention  briefly,  post¬ 
poning  their  detailed  discussion  to  Section  3. 

1.  The  points  x(/t)  need  not  be  computed  exactly. 

2.  Newton’s  method  and  the  extrapolation  must  be  secured  by  a  linesearch  (since 
the  function  <t>  that  defines  x(^)  is  not  defined  outside  P). 

3.  We  must  find  an  initial  point  x(no)  minimizing  fo{x)  +  no4>(x)- 

2.4.  Note  on  Primal-Dual  Methods 

The  method  outlined  above  works  in  the  primal  space  only.  We  briefly  mention  the 
relationship  to  primal-dual  methods. 

It  is  straightforward  to  convert  the  KKT-conditions  of  a  convex  optimization 
problem  into  a  nonlinear  complementarity  problem  that  involves  primal  and  dual 
variables.  The  functions  defining  the  nonlinear  complementarity  problem  are  p  jiio- 
tone  (they  are  the  gradients  of  convex  functions)  and  interior-point  algorit*  ms  for 
solving  nonlinear  complementarity  problems  with  monotone  functions  have  been 
proposed  in  [12,  13].  Implementations  of  such  primal-dual  methods  proved  to  very 
effective  when  applied  to  linear  programs  [14,  15],  and  it  may  be  expected  that  the 
same  also  holds  for  nonlinear  problems.  However,  the  search  dir.rtions  generated 
by  primal  and  primal-dual  methods  coincide  at  points  on  the  path  of  analytic  cen¬ 
ters,  and  the  worst-case  complexity  for  primal  and  primal-Jual  algorithms  is  the 
same.  Moreover,  the  strong  theoretical  results  proved  in  t.7,  24]  about  the  conver¬ 
gence  of  primal  methods  for  solving  convex  programs  have  not  been  proved  (yet) 
for  primal-dual  methods. 
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2.5.  Important  Details 

Our  method  has  much  in  common  with  the  traditional  barrier  methods  suggested 
by  [2,  1]  and  implemented  in  SUMT  in  1964.  It  is  natural  to  ask  why  these  methods 
did  not  retain  their  initial  popularity.  We  try  to  give  a  partial  answer  by  pointing 
out  some  new  developments. 

•  It  is  important  which  barrier  function  is  chosen.  For  large  classes  of  problems 
the  logarithmic  barrier  function  allows  rigorous  proofs  of  convergence  that 
cannot  be  given  for  other  barriers  like  l//,(x).  An  implementation  that  takes 
advantage  of  the  strong  theoretical  results— in  particular  those  shown  in  [24, 
7] — may  yield  a  robust  solver  for  some  classes  of  (almost)  convex  problems. 

•  As  pointed  out  in  [20,  30],  the  Hessians  of  the  barrier  functions  become  increas¬ 
ingly  ill-conditioned2  as  the  iterates  approach  an  optimal  solution  x*  if  there 
are  less  than  n  linearly  independent  constraints  active  at  x*.  This  difficulty 
also  occurs  when  solving  degenerate  linear  programs  by  interior-point  meth¬ 
ods.  The  large  number  of  numerical  experiments  carried  out  to  date  suggests, 
however,  that  with  a  careful  choice  of  algebra  for  solving  the  linear  equations 
this  difficulty  can  be  overcome.  We  may  hope  that  this  is  also  the  case  when 
solving  nonlinear  problems.  In  addition,  the  fact  that  computers  today  use  a 
much  higher  arithmetic  precision  than  25  years  ago  makes  current  codes  loss 
sensitive  to  ill-conditioning.  Finally,  as  pointed  out  by  [3],  a  concept  used  in 
early  barrier  methods  of  enforcing  equality  constaints  by  a  quadratic  penalty 
function  (rather  than  linearizing  them  at  each  step  and  using  projections) 
introduced  further  numerical  instability. 

•  Many  implementations  of  interior-point  methods  for  the  conceptually  simpler 
problem  of  solving  linear  programs  have  been  tested  in  the  recent  past.  These 
implementations  documented  the  great  importance  of  good  sparse-matrix  tech¬ 
niques.  Without  the  latter,  interior-point  methods  for  large  linear  programs 
are  completely  unattractive,  and  the  same  may  be  true  for  nonlinear  problems. 
It  may  be  anticipated,  however,  that  interior-point  methods  applied  to  certain 
classes  of  “inherently  sparse”  (e.g.  separable)  problems  with  cheap  first  and 
second  derivatives  will  be  able  to  exploit  the  additional  structure  and  yield 
fast  special-purpose  solvers. 

3.  A  MODIFIED  BARRIER  METHOD  FOR  (CP) 

3.1.  Shifted  Constraints  /,(x,/i) 

A  given  initial  point  x°  G  S°  might  not  be  feasible  for  (CP).  In  order  to  define 
a  barrier  function  for  x°  we  “enlarge”  the  feasible  set  /'  by  subtracting  certain 

2 This  can  also  be  seen  from  the  inner  ami  outer  ellipsoidal  approximations  of  the  level  sets  by 
ellipsoids  id  veil  by  the  Hessian  of  the  barrier  function  and  centered  al  the  points  /(/i)  If  there  an 
less  than  n  active  constraints,  the  level  sets  become  "flat’"  and  t he  ellipsoids  approximating  them 
hence  become  singular  in  the  limit;  see  [7], 
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nonnegative  quantities  j3,  from  fi  such  that  f,(x°)  -  (3i  <  0,  i.e.  such  that  z°  is  in 
the  “enlarged”  feasible  domain  {x  |  f,(x)  -  0,  <  0}. 

More  precisely,  let  t  =  maxi<;<m{l,/;(x0)}  and  /3<  =  max{/,(i°)  +  f,0}  for 
1  <  i  <  m,  and  define3 

=  Mx)  ~  M-  (3-!) 

This  implies  that 

fi(x°,  1)  <  —t,  1  <  i  <  m. 

The  above  computation  of  is  not  invariant  under  multiplication  of  /,  with  a 
positive  constant4.  Without  loss  of  generality  we  therefore  assume  that  all  functions 
/,  satisfy 

IM*°)||2  +  4=11  H,(x°)\\f  =  1,  0  <  »  <  m,  (3.2) 

y/Tl 

where  the  Frobenius  norm  ||A||f  of  a  matrix  A  is  given  by  ||A||f  =  (£  A^)1/2  an(f 
is  easy  to  compute.  (To  arrange  this  we  simply  multiply  fi(x)  by  a  suitable  scalar 
before  we  start  the  algorithm.)  Note  that  Vr/,(x,/z)  =  g,(x). 

3.2.  Shifted  Sets  P ^ 

For  /t  6  [0, 1]  we  consider  feasible  domains  P M  defined  by 

P»  :=  {*1  fi(x,p)  <0,  1  <i<m}.  (3.3) 

Note  that  Pq  =  P  and  that  x°  is  in  the  interior  of  P\  (i.e.  x°  6  P°).  The  following 
lemma  relates  the  feasible  domains  P 4  to  P. 

Lemma  1 

1.  P  C  Pm  C  Pm  for  0  <  n\  <  M2  <  1- 

2.  P  =  n„>0Pm 

3.  If  P  is  not  empty,  the  interior  P°  is  not  empty  for  all  ^  G  (0, 1]. 

4.  If  P  is  empty,  there  is  a  6  in  [0, 1)  such  that  P^  is  empty  for  all  /z  €  [0,(>)  and 
the  interior  P°  is  not  empty  for  all  \i  6  (5,1). 

Proof:  See  Appendix  B.  I 

The  algorithm  below  follows  a  path  of  points  z(m)  G  from  /z  =  1  to  fi  =  0+.  In 
contrast  to  the  simple  outline  given  earlier,  the  feasible  sets  P ^  for  the  subproblems 

3The  approach  presented  here  may  need  to  be  modified  in  the  following  case.  A  certain  function 
/,  may  be  convex  (or  have  a  self-concordant  barrier  function)  in  the  domain  {x|/,(z)  <  0)  but  not 
in  ji|/,(x)  <  /?,}  if  j3,  >  0.  For  a  more  precise  statement  we  refer  to  Appendix  A. 

4  Nor  is  it  affine  invariant.  However  it  guarantees  that  the  initial  point  x°  is  at  least  t  >  1  away 
from  each  constraint  f%(x)-0,  <  0,  which  is  sufficient  for  our  purposes.  (Finite-precision  arithmetic 
is  not  affine  invariant  either.)  An  affine  invariant  computation  of  8  is  given  by  the  solution  of  the 
following  problem:  min { ||/311a  1  8  >  0,  /i(x°)  -  8<  <  0,  5Zsi(z°)/(/i(*°)  -  0,)  =  0). 
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of  finding  x(p)  do  not  remain  constant  in  our  method  below.  For  the  sets  P ^  we 
define  a  barrier  function  4>  of  the  free  variables  x  and  the  parameter  p  by5 

m 

4>{x,fl)  =  -  ^log(-/,(i,/x)). 

1=1 


3.3.  The  Perturbed  Center 

Before  stating  the  subproblems  that  define  the  points  x(p),  we  digress  briefly  for  a 
technical  but  convenient  detail. 

First  note  that  the  properties  of  the  barrier  function  <f>,  in  particular  its  second 
derivative  and  convexity,  finiteness  in  P and  the  limit  as  x  approaches  the  boundary 
of  are  invariant  under  linear  perturbations  of  4>. 

Hence,  the  set  {4>  |  <j>(x,p)  :=  <j>(x,p)  +  wTx,  w  £  lRn  fixed}  of  linear  perturba¬ 
tions  of  4>  forms  a  family  of  barrier  functions  for  P^.  Each  barrier  function  defines 
a  smooth  path  of  minimizers  argmin{/o(i)  +  p<j>(x,p)}  leading  from  some  point  in 
P°  to  an  optimal  solution  of  (CP).  Also,  for  any  >  0  and  any  x°  £  P°0  there  is  a 
unique  w  such  that  the  path  starts  at  x°  when  p  runs  from  p°  to  0.  Therefore,  the 
functions  <f>  define  a  vector  field  that  flows  to  the  optimal  set,  a  fact  that  will  be  used 
extensively  later  on  and  is  well  described  for  the  case  of  linear  constraints  in  [17]. 
The  minimizers  of  the  perturbed  barrier  functions  will  be  called  perturbed  centers. 
and  the  paths  of  the  vector  field  will  be  referred  to  as  perturbed  center  paths. 

The  “perturbed  center”  without  perturbation  (i.e.  with  w  =  0)  is  the  analytic 
center  defined  by  Sonnevend  in  [28].  It  exists  if,  for  example,  the  set  of  optimal 
solutions  is  bounded  and  the  Hessians  of  the  constraints  satisfy  a  relative  Lipschitz 
condition  [6].  For  the  analytic  center  a  number  of  nice  properties  can  be  shown;  in 
particular, 

•  If  the  shifts  (3,  are  zero,  one  can  show  that  there  is  a  two-sided  ellipsoidal 
approximation  of  the  level  set  {x  £  P  j  fo(x)  <  A(/t)}  centered  at  the  point 
x(/t),  such  that  the  ratio  of  the  inner  ellipsoid  to  the  outer  ellipsoid  is  of  order 
m  (the  number  of  constraints)  independent  of  the  data  of  the  problem  [29,  7]. 
Here,  A(p)  >  /0(x(p))  is  a  suitable  number  whose  derivation  is  explained  in 

[29]. 

•  Again  if  /?,  =  0,  the  numbers  /// f,(x(p))  define  dual  feasible  variables  that  can 
be  used  in  a  test  for  optimality. 

Both  properties  hold  in  a  somewhat  weaker  form  if  the  perturbation  w  is  small  [6]. 

5Thc  change  in  concept  is  not  substantial.  Indeed,  we  may  regard  as  a  function  of 

the  n  1  variables  x,p  In  the  x,p  space  the  domains  of  the  barrier  functions  remain  constant, 
(.dearly,  if  the  functions  /,(x)  are  linear  or  convex  quadratic,  then  so  are  the  functions  /,(x./i). 
and  the  properties  of  the  logarithmic  barrier  function  (the  relative  Lipschitz  condition  in  [7]  or  self¬ 
concordance  in  [-M])  also  hold  in  the  r.,fi  space.  For  other  convex  functions  this  is  not  always  true 
However,  a  complexity  analysis  based  on  these  properties  is  not  of  our  concern  for  the  moment  We 
hope  to  identify  larger  classes  of  problems  that  can  be  solved  by  our  method. 
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The  particular  perturbation  we  choose  is 


m  . 

w  :=  pg0{x°)  +  £  _7 /~0  r,Si{x°),  (3.4) 

where  p  is  a  scalar  to  be  specified  later.  (Both  w  and  p  are  fixed  throughout  the 
algorithm.)  For  our  barrier  method  we  consider  the  functions  p^  :  P°  —>  St, 

m 

<pAx)  •=  -fo(x)  -  ]£log {-fi{x,p))-  wTx,  (3.5) 

P  i=\ 

that  combine  a  multiple  of  the  objective  function  and  the  perturbed  barrier  function. 
For  p  £  (0, 1]  we  define6  x(p)  to  be  a  perturbed  center  if  x(p)  £  P°  and  if  it  is  a 
minimum  of  the  function  w  We  denote  the  gradient  and  Hessian  of  p^  by  g(x,p) 
and  H(x,p)  as  follows: 


g{x,p)  :=  Vy>M(x)  -  ^ g0(x 


(3.6) 


and 


+ 


fi{x,p)2 


(3.7) 


Clearly,  H{x,p)  is  positive  semidefinite  if  the  /;  are  convex.  Here,  we  assume  that  it 
is  positive  definite.  It  becomes  ill-conditioned,  however,  if  x  approaches  some  point 
on  the  boundary  of  P „  at  which  less  than  n  linearly  independent  constraints  are 
active.  In  Section  4.4  we  “expand”  the  matrix  H  and  refer  to  [4]  to  improve  the 
stability  when  solving  linear  systems  with  H .  Note  that  x  —  x(p)  is  a  perturbed 
center  if  and  only  if  it  is  a  zero  of  the  following  characteristic  equation: 


g{x,p)  =  0.  (3.8) 

(Clearly,  if  x  satisfies  g{ x,p)  =  0,  then  by  convexity  it  is  a  minimum  of  w  Con¬ 
versely,  if  i  is  a  minimum  of  in  the  open  set  P°,  then  g(x,p)  =  0.) 

Note  that  by  definition  of  w  the  point  x°  is  the  first  center:  x°  =  x(l).  Ideally 
we  would  like  the  perturbation  w  to  be  zero,  in  which  case  the  points  x(p)  are  the 
analytic  centers.  The  size  of  w  depends  on  the  choice  of  the  starting  point  and  on 
the  shifts  /?,.  If  w  is  close  to  zero,  one  can  prove  that  the  tangent  to  the  curve 
x(p)  closely  approximates  the  curve  in  some  interval  [p\,p2 ]  that  does  not  depend 
on  the  problem  data.  For  large  w  (measured  in  the  norm  given  by  //(x,/i)_1)  this 
is  no  longer  true.  It  is  therefore  important  to  initialize  the  method  such  that  iv  is 
moderate  in  size,  see  Appendix  A.2.  Eventually,  H(x(p),p)  —*  oo  as  p  — ^ >  0,  so  that 
id  measured  in  the  above  norm  tends  to  zero. 

The  function  pp^(x)  is  (at  least  for  p  -  1)  just  the  objective  function  /o(x)  to 
which  a  multiple  (p)  of  the  barrier  function  is  added,  as  in  the  outline  of  Section 

6If  P  is  empty  then  the  definition  is  valid  only  for  p  6  (6, 1],  where  6  is  as  in  Lemma  1.  We  note 
tli at  for  certain  degenerate  cases  the  analytic  center  may  not  exist,  while  the  perturbed  center  is 
well-defined. 
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2.1.  Our  choice  of  in  (3.5)  rather  than  the  seemingly  more  natural  choice 
can  be  motivated  by  its  gradient  (3.6)  defining  the  characteristic  equation  (3.8).  If 
we  set  p  =  1  for  the  moment,  then  (3.6)  shows  a  certain  symmetry  in  the  treatment 
of  go  and  g,  for  i  >  1:  The  parameter  /x  may  be  considered  as  a  function  value 
g  =  -fo(x,n)  :=  \(fi)  —  fo{x)  for  some  suitable  quantity  A(p).  For  quadratic  /,  it 
has  been  shown  in  [29]  that  there  is  a  monotone  1-1  correspondence  between  g  and 
A (g),  and  that  A (g)  converges  to  the  optimal  value  fo(x*).  (The  result  generalizes  in 
a  straightforward  way  to  convex  /,  with  self-concordant  barrier  functions.)  Another 
reason  why  we  consider  ip^  rather  than  is  that  the  property  of  self- concordance 

is  not  invariant  under  multiplication  of  the  barrier  function  by  g. 

Under  relatively  weak  assumptions  the  following  properties  can  be  shown. 

Lemma  2 

1.  A  unique  perturbed  center  exists  for  all  g  for  which  P°  is  not  empty. 

2.  limM_o  minx€p  ||x(/i)  -  x||2  =  0  if  P  is  not  empty. 

3.  lim^—o  fo(x(g))  —  /o(z*)  =  0  if  x*  is  an  optimal  solution  to  (CP). 

Proof:  See  Appendix  C.  I 

4.  ONE  ITERATION  OF  THE  METHOD 

The  general  idea  of  the  method  is  as  follows.  Starting  from  g  —  l  and  i(l)  =  a:0, 
a  sequence  of  iterates  is  generated  in  some  neighborhood  of  the  path  x(/x).  The 
iterates  xh  are  regarded  as  approximations  to  points  x(gk)  on  the  path  of  perturbed 
centers,  where  /x°  =  1,  gk  >  0,  gk  — »  0.  The  algorithm  proceeds  in  three  steps  per 
iteration. 

1.  Compute  the  tangent  x'  to  the  curve  x(fi)  at  the  current  iterate  xk  and  pk. 

2.  Choose  adaptively  a  steplength  a  6  (0,1)  to  follow  the  tangent  starting  from 

xk .  Let  the  resulting  point  be  xk  =  xk  -  apkx'.  Set  fik+l  -  fik(  1  -  «).  The 
steplength  o  is  chosen  such  that  xk  €  and  such  that  Newton  iterations 

starting  from  xk  for  finding  x(/.ik+})  can  be  expected  to  converge  rapidly. 

3.  Perform  a  small  number  of  Newton  steps  to  bring  the  iterate  closer  to  the 
path  of  perturbed  centers  (using  some  “old”  factorization  of  the  Hessian  and 
a  linesearch  with  merit  function  p^+i).  The  result  is  xk+1 . 

It  is  in  Step  3  where  our  method  differs  from  most  implementations  of  interior-point 
algorithms  for  linear  programming  [4,  14,  15].  For  linear  programs  it  appears  that 
the  extra  effort  taken  in  Step  3  to  move  away  from  the  boundary  towards  the  renter 
does  not  pay.  For  nonlinear  problems  our  results  indicate  that  "centering"  stabilizes 
the  algorithm  and  may  be  necessary  in  some  cases. 
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4.1.  The  Tangent 

Let  v(x ,/t)  be  the  derivative  of  g(x,g.)  with  respect  to  p: 

V(x,v)  :=  =  -^so(x)  -  (4.1) 

If  f/(x(/t),p)  is  positive  definite  the  perturbed  center  is  unique  and  the  tangent  to 
the  curve  of  perturbed  centers  at  x(fi)  is  defined  by  the  linear  system 

H(x(n),n)x\n)  =  -v(x(fi),ii).  (4.2) 

Verification  of  (4.2)  is  straightforward  by  differentiating  g(x(n),n)  =  0  in  (3.6)  with 
respect  to  //.  (It  can  be  shown  that  limM_o  l*2 H(x(n),p)  exists  and  is  nonzero.  In  our 
implementation  we  therefore  use  n2H(x,n)  and  p2u(x,p)  instead  of  the  unbounded 
quantities  H(x,n)  and  v(x,/x).) 

Note  that  w  does  not  occur  in  the  definition  of  the  tangent.  If  the  current 
iterate  is  some  point  xh  that  is  not  on  the  path  of  perturbed  centers,  then  the  above 
quantity  is  the  tangent  to  some  other  perturbed  center  curve  that  also  leads  to  an 
optimal  solution  x*. 

We  note  the  inherent  sparsity  of  H(x,g.)  if  the  functions  f,  each  depend  on  few 
variables  only,  or  if  there  is  a  small  number  of  separable  functions  /,.  (For  separable 
fi  the  gradient  g,  and  thus  also  gxgj  could  be  full.  The  “expansion”  of  H  suggested 
in  Section  4.4  that  is  intended  to  reduce  the  condition  number  of  the  linear  system 
also  preserves  the  sparsity  in  this  case.) 

In  Step  1  above  we  determine  x'  from  (4.2)  with  xk  in  place  of  x(/r)  and  /r  =  fik , 
i.e.  from  the  system  H(xk,fik)x'  =  -v(xk,nk).  The  steplength  a  in  Step  2  depends 
on  how  well  Newton’s  metnod  converges.  We  focus  on  the  Newton  step  first. 

4.2.  The  Newton  Step 

The  Newton  step  Ax  for  finding  x(/x)  starting  from  x  £  P^  is  given  by  the  system 

H{x,n)Ax  =  -5(x,p).  (4.3) 

From  [24,  6]  we  know  that  Newton’s  method  (without  linesearch)  for  finding  the 
center  x(/i)  converges  quadratically  if  ifn  >s  self-concordant  (i.e.  if  the  Hessians  of 
the  constraints  satisfy  a  relative  Lipschitz  condition)  and 

7  :=  AxTII(x,fi)Ax  =  g(x,n)TH(x1n)~1g(x,n)  =  -g(x,n)T Ax  <  (4.4) 

We  note  that  a  Newton  step  for  finding  x(pi+1)  may  not  be  necessary,  since  as 
mentioned  above,  all  the  “perturbed  center  curves”  end  in  the  optimal  set,  and  one 
could  continue  by  following  the  tangents  of  different  curves.  However,  the  step  along 
the  tangent  may  bring  the  point  xk  close  to  the  boundary  of  P^k  (and  iterating  too 
close  to  the  boundary  of  P^  slows  down  convergence),  so  that  a  Newton  correction 
is  indeed  useful.  In  our  program  we  compute  the  Newton  correction  with  the  same 
factorization  of  H  as  already  used  for  the  computation  of  x' .  The  linesearch  during 
the  Newton  step  is  controlled  by  the  merit  function  (p^k+  i(x). 


10 


4.3.  The  Steplength  a  During  Extrapolation 

Our  goal  is  to  choose  a  £  (0,1)  as  large  as  possible  such  that  the  extrapolation 
xk  =  xk  -  ankx'  is  strictly  feasible,  i.e.  xk  £  P°k+-.  with  /rfc+1  =  /ik(  1  -  a),  and  such 
that  Newton’s  method  for  finding  the  (next)  center  converges  rapidly. 

An  obvious  possibility  to  guarantee  the  second  condition  is  to  choose  the  step- 
length  during  extrapolation  small  enough  such  that  the  first  Newton  step  Ax  starting 
at  the  predicted  point  satisfies  a  relation  of  the  type  (4.4).  However,  this  results 
in  very  short  steps  a  in  general.  In  our  implementation  we  first  approximated  the 
maximum  possible  steplength  omax  such  that  x  +  am^xx'  is  still  feasible  and  then 
took  r  percent  of  amax.  If  it  turned  out  that  Newton  iteration  converged  quickly  we 
increased  r  for  the  next  extrapolation,  and  conversely,  if  Newton  iteration  was  slow 
we  decreased  r. 


4.4.  Improving  the  Stability  of  the  Linear  Systems 

At  the  definition  cf  H  in  (3.7)  we  mentioned  the  instability  to  be  expected  when 
solving  systems  with  II.  In  [4],  Gill  et  al.  present  an  approach  to  stabilize  the 
solution  of  KKT  systems.  To  illustrate  how  their  analysis  applies  to  our  matrix  II 
we  set  p  =  1  and  p  =  0  for  the  moment.  Note  that  II  can  be  written  in  the  form 

II(x,fi)  =  II(x,fi)  +  JT  D~2J.  (4.5) 

where  II{x,fi)  =  ~IIq(x)  +  YT=\  Hi(x),  J  is  the  Jacobian  of  the  constraints, 


J  = 


(  fft(-r) 


V  9th(%) 


(4.6) 


and  D  =  diag( /,(x )). 
with 


Soiving  a  system  with  II  is  equivalent  to  solving  a  system 


K  = 


-D2 

JT 


(4.7) 


which  can  be  seen  when  taking  the  Schur  complement  of  —  I)2  within  A’.  This 
system  in  turn  is  equivalent  to  a  system  involving 


/  D~2  0  /  \ 

h"  =  0  fl  JT 

\  I  J  0  / 


(4.X) 


(Take  the  Schur  complement  of  D~2.)  Systems  of  the  form  K'  are  considered  in 
[4],  The  basic  idea  is  that  it  is  better  to  factorize  A'  directly,  or  to  take  the  Schur 
complement  of  just  certain  parts  of  the  diagonal  matrices,  such  that  the  Schur  com¬ 
plement  does  not  become  excessively  ill-conditioned  (and  does  not  suffer  excessive 
fill-in  caused  by  dense  rows  or  columns  of ./). 


11 


4.5.  Stopping  test 


Let  c  be  the  desired  final  accuracy  in  the  objective  function.  A  possible  stopping 
criterion  is  p  <  ep(  1  -f  ||D/o(x)||)/m.  For  convex  constraints  with  self-concordant 
barrier  functions  and  a  linear  objective  function  this  stopping  test  is  exact  at  points 
on  the  path  of  analytic  centers  [7].  However,  since  the  constraints  have  been  shifted, 
the  final  iterate  is  not  always  feasible.  It  is  only  guaranteed  that  /,(x)  <  p/3,.  The 
relative  constraint  violation  is  bounded  by  p/3, 7(1  +  ||.D/,(z)||).  In  the  numerical 
experiments  below  we  have  chosen 

ft  :=  min(  ep(  1  +  \\D  f0(x°)\\)/n,  c(l  +  ||D/,(x°)||)/(/3,  +  c)  ) 
for  1  <  i  <  m  and  stopped  as  soon  as  p  <  ft. 


4.6.  Convergence 

Before  concluding  this  description  we  briefly  state  some  convergence  results. 

•  If  (CP)  has  an  optimal  solution,  one  can  show  under  weak  conditions  that  as 
/t  —  0  the  iterates  xk  satisfy  the  same  limit  relations  (for  k  — >  oo)  as  stated 
for  x (ft)  in  Lemma  2. 

•  If  (CP)  has  no  optimal  solution,  either  xk  — *  oo  (if  the  problem  is  unbounded) 
or  we  find  that  p  — *  b  >  0  with  b  as  in  Lemma  1  Part  4.  Both  cases  (xk  —  oc 
and  ft  —  b  >  0)  are  hard  to  identify  in  an  implementation  and  need  special 
attention. 

5.  NUMERICAL  EXPERIMENTS 

The  above  method  was  implemented  in  MATLAB™[19]  and  tested  on  a  few  prob¬ 
lems  with  up  to  300  unknowns  and  dense  arithmetic.  As  mentioned  before,  the  use 
of  sparsc-matrix  techniques  will  be  crucial  for  the  efficiency  of  this  method.  The  de¬ 
velopment  of  efficient  interior-point  methods  for  linear  programs  took  several  years 
and  similar  efforts  may  be  needed  for  developing  an  interior-point  method  for  non- 
linearly  constrained  problems.  The  goal  of  the  implementation  here  was  merely  to 
illustrate  the  behavior  of  the  method  in  terms  of  number  of  iterations  and  Newton 
corrections,  and  to  test  various  parameters  (such  as  /3  and  p)  that  define  the  barrier 
function. 

The  statistics  below  read  as  follows.  Each  iteration  involves  computation  of 
the  tangent  and  a  small  number  (1  to  10)  of  Newton  steps.  The  tangent  and  the 
Newton  steps  are  computed  from  a  linear  system  that  involves  the  Hessian  of 
Sometimes  more  than  one  Hessian  is  needed  in  an  iteration.  Each  Hessian  is  used 
for  several  Newton  steps;  their  computation  and  factorization  dominates  the  overall 
computation. 
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5.1.  Problem  Manne 


This  Problem  is  taken  from  [21],  where  it  is  presented  in  two  versions.  The  first 
version  involves  300  variables,  a  logarithmic  objective  function,  100  nonlinear  con¬ 
straints,  100  linear  inequalities  and  400  simple  bounds.  The  second  version  is  iden¬ 
tical  except  that  it  has  only  300  simple  bounds. 

In  [21]  results  are  given  for  MINOS.  Version  one  took  7  major  iterations,  183 
minor  iterations,  497  function  evaluations  and  12  seconds  on  an  IBM  370/168,  while 
version  two  required  11  major  iterations,  355  minor  iterations,  859  function  evalua¬ 
tions  and  34  seconds. 

MINOS  performs  best  if  a  high  number  of  linear  constraints  or  bounds  are  active 
in  the  optimal  solution,  thus  reducing  the  size  of  the  (dense)  systems  that  are  solved 
in  each  iteration.  For  version  one,  the  size  of  the  dense  systems  grew  to  25,  and  for 
version  two  they  grew  to  99  (since  some  of  the  active  bounds  in  version  one  wore 
removed). 

In  contrast  to  MINOS,  the  size  of  the  systems  to  be  solved  in  each  iteration  of 
the  interior-point  algorithm  is  always  300,  i.e.  the  number  of  variables.  For  both 
versions  of  the  problem  these  systems  are  sparse  and  of  diagonal  structure,  with  at 
most  7  nonzeros  per  row.  A  sparse-matrix  solver  could  take  great  advantage  of  the 
structure. 

Problem  Manne  is  sparse  and  convex,  but  it  does  not  satisfy  the  conditions  of 
[6]  or  [24]  that  guarantee  a  fixed  minimum  rate  of  convergence. 

Below  we  report  the  results  of  our  method  for  version  one  and  version  two.  As 
starting  point  we  chose  the  (infeasible)  vector  of  all  ones.  (The  objective  and  some 
of  the  constraints  are  not  defined  for  x  =  0.)  In  contrast  to  MINOS,  the  interior- 
point  algorithm  performed  slightly  better  for  the  second  problem,  giving  hope  that 
for  certain  problems  in  which  the  active  constraints  do  not  significantly  reduce  the 
dimension  of  the  MINOS  subproblems,  interior-point  algorithms  may  become  an 
attractive  alternative. 

Problem  Manne  was  one  for  which  our  initialization  in  Section  3.3  resulted  in 
a  vector  w  of  norm  (t n^//",ir)1/2  ~  15.  By  the  procedure  in  Appendix  A. 2  we 
decreased  the  norm  of  w  to  about  one  before  starting  t  he  iterations.  For  comparison 
we  also  list  the  results  for  problem  one  without  reducing  the  size  of  tr.  In  this  case 
convergence  was  very  slow,  and  for  many  iterations  the  maximum  steplength  ninax 
during  extrapolation  was  less  than  0.25.  In  all  examples,  Newton's  method  for 
finding  the  perturbed  center  at  each  iteration  was  terminated  when  the  //-norm 
||Az||//  :=  (Ax'r//(r,/i)Az)1/2  of  the  Newton  step  A.r  was  less  than  1/2. 

We  also  show  the  results  for  a  smaller  version  of  problem  one  with  only  30 
unknowns,  to  show  that  the  number  of  iterations  grows  only  moderately  with  the 
number  of  variables  for  this  particular  problem. 


Problem 

Large  w 

Version  one 

Version  two 

Small  problem 

Iterations 

38 

19 

15 

11 

Hessians 

40 

24 

20 

13 

Newton  steps 

115 

76 

61 

38 

Plloo 

2.05 

2.05 

2.05 

2.05 

p 

4.1 

4.1 

3.8 

0.29 

Final  /r 

3.4e-9 

5.7e-9 

2.2e-8 

8.7e-9 

Objective 

9.287556 

9.287556 

9.330183 

2.670098 

Constraint  violation 

4.9e-9 

8.1e-9 

3.1e-8 

l.le-8 

5.2.  Problem  385,  Schittkowski 

This  is  a  problem  with  15  unknowns,  10  convex  quadratic  constraints  and  a  linear 
objective  function.  It  is  taken  from  [27],  where  it  was  solved  with  NLPQL  [26]  using 
693  function  evaluations  and  242  gradient  evaluations.  Running  times  or  numbers 
of  arithmetic  operations  are  not  reported  in  [27].  The  starting  point  (zero)  was 
strictly  feasible  and  /3  was  zero.  (Hence,  P ^  was  constant  and  also  the  final  point 
was  strictly  feasible.)  Our  implementation  took  11  iterations  to  solve  the  problem, 
a  total  of  11  evaluations  of  the  Hessian,  27  Newton  steps  (each  of  which  requires 
the  evaluation  of  the  gradient  of  ip)  and  64  additional  gradient  evaluations  for  the 
linesearch  steps.  The  steplength  a  was  0.80  on  average,  ranging  from  0.70  to  0.92. 
The  Hessians  of  the  constraints  are  diagonal,  but  to  preserve  the  sparsity  of  the 
Hessians  of  p,  the  dense  outer  products  of  the  gradients  in  (3.7)  must  be  treated 
separately  (for  example  as  in  Section  4.4). 

5.3.  Problem  386,  Schittkowski 

This  is  the  same  as  problem  385  above  (except  that  two  entries  in  the  coefficients 
of  the  constraints  are  changed)  with  an  additional  concave  constraint.  In  [27],  900 
function  evaluations  and  327  gradient  evaluations  were  required  to  solve  the  problem 
to  6  digits  of  accuracy.  In  order  to  explore  the  limit  of  applicability  of  our  method 
we  tested  this  problem  with  different  parameter  settings. 

•  Using  the  standard  method,  the  Hessian  of  p  became  indefinite  in  the  8-th 
iteration  and  our  algorithm  failed. 

•  In  a  second  run  we  set  the  Hessian  of  the  concave  constraint  equal  zero  but 
kept  all  other  second-derivative  information.  The  method  converged  to  the 
true  solution  in  10  iterations  using  10  evaluations  of  the  Hessian  of  p  and  28 
Newton  steps. 

•  In  a  third  run  we  set  the  Hessians  of  all  constraints  to  zero  (simulating  a 
linearized  problem).  In  this  case,  the  Hessian  of  p  was  indefinite  to  begin  with 
(since  there  were  only  11  linearized  constraints  in  a  15  dimensional  space)  and 
the  method  failed  again. 

•  Finally  we  replaced  the  Hessians  of  all  constraints  by  multiples  of  the  identity 
(such  that  the  norm  of  the  replacement  approximately  equals  the  norm  of  the 
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true  Hessian).  This  time  the  method  took  15  iterations,  20  evaluations  of  the 
Hessian  of  <p  and  62  Newton  steps. 

5.4.  Conclusions 

The  design  of  fast  and  stable  implementations  of  interior-point  algorithms  is  marked 
by  a  number  of  conflicting  principles. 

•  It  is  desirable  to  maintain  some  polynomiality  results  that  limit  the  dependence 
of  the  method  on  the  data  of  a  particular  problem.  However,  most  polynomial¬ 
time  interior-point  methods  are  too  conservative  in  the  choice  of  the  steplength 
and  therefore  inefficient  in  practice. 

•  To  maintain  affine  invariance  of  the  method  seems  equally  important,  to  reduce 
the  influence  of  affine  transformations  of  the  problem.  With  finite-precision 
arithmetic  this  influence  however  cannot  be  eliminated  completely. 

•  Finally,  the  linear  systems  involved  should  be  kept  well-conditioned. 

A  typical  example  of  how  to  take  these  concepts  into  account  in  the  above  method 
is  the  steplength  along  the  tangent:  the  closer  the  extrapolation  to  the  boundary, 
the  more  ill-conditioned  the  Hessian  of  the  barrier  function  (and  the  worse  the  the¬ 
oretical  complexity),  suggesting  that  one  should  not  take  too  large  steplengths.  The 
concept  of  numerical  stability  based  on  the  condition  number  of  a  matrix  however  is 
not  perfect.  Not  only  do  interior-point  methods  for  some  reason  perform  well  when 
taking  steps  of  99.995%  to  the  boundary  (!!!)  [14],  but  there  are  also  simple  exam¬ 
ples  for  which  the  condition  number  of  a  matrix  is  (almost)  irrelevant;  for  example 
when  solving  equations  with  the  “ill-conditioned”  diagonal  matrix  diag(  10’°,  1 0— 10 ) . 

A.  DETAILS  FOR  THE  INITIALIZATION 

A.l.  Additional  Assumption 

In  the  footnote  preceding  the  definition  of  the  functions  /,(x,p)  (3.1)  we  have  re¬ 
ferred  to  Appendix  A  for  a  more  precise  statement  about  our  assumptions.  To 
guarantee  convexity  we  require  the  following  assumption. 

We  assume  that  the  functions  /,  are  convex  and  continuous  in  the  set 
Pi  and  smooth  in  its  interior.  That  is,  we  assume  that  P\  C  S. 

If  this  assumption  is  violated  we  can  try  to  change  the  given  values  of  J,  or  the 
starting  point  i°.  (Pi  depends  on  f),  and  x°.) 

A  slightly  more  complicated  modification  is  as  follows.  If  points  j*  are  known 
such  that  fi(x')  <  0,  we  may  consider  the  functions  /,(x,/r)  :=  ft(x  +  // ( j‘  -  x0)). 
In  this  case,  Lemma  1,  Part  1  and  2  no  longer  hold  but  the  sets  P(1  still  converge  to 
P  and  Part  3  and  4  still  hold.  We  will  not  discuss  this  modification  any  further  but 
concentrate  on  the  hopefully  more  common  case  (3.1). 
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A. 2.  Decreasing  the  Perturbation 

As  mentioned  in  Section  3.3  it  is  important  that  the  perturbation  w  describing  the 
perturbed  centers  not  be  too  large;  more  precisely  that  wTD2<f>i(x°)~lw  is  say  less 
than  1.  Only  for  small  vectors  w  is  it  possible  to  prove  polynomiality  for  linear 
constraints,  and  our  numerical  experiments  suggest  that  in  practice  also,  large  per¬ 
turbations  w  slow  down  convergence.  In  some  examples  however,  the  initialization 
outlined  in  this  paper  does  yield  large  perturbations  in,  and  it  is  necessary  to  reduce 
the  size  of  in  before  starting  the  algorithm.  For  such  cases,  our  implementation 
reduced  the  size  of  w  by  the  following  procedure. 

•  Before  starting  the  predictor-corrector  iterations  set  w  =  0  and  introduce 
additional  constraints  of  the  form  (x,—  x°)2  <  1012f2  (with  t  as  in  Section  3.1). 

•  Perform  a  number  of  Newton  steps  for  finding  the  (analytic)  center  of  P\  with 
the  additional  constraints,  and  stop  when  the  //-norm  ||Ax||// 
=  ( AxtH(x,  lJAx)1/2  of  the  Newton  step  Ax  satisfies  a  given  bound.  Let 
the  result  be  x°. 

•  Remove  the  additional  constraints  again  and  redefine  w  for  the  new  starting 
point  x°  as  outlined  in  Section  3.3. 

This  procedure  does  not  assume  that  a  bound  of  the  form  “||X  ~  *°||oo  <  106f  for  all 
feasible  x”  is  known  a  priori;  the  additional  constraints  are  only  used  for  decreasing 
w  to  guarantee  that  Newton’s  method  is  well  defined,  and  they  are  later  removed. 

A. 3.  Warm  Start 

If  an  initial  point  x°  is  given  that  is  “almost”  optimal,  a  “warm  start”  is  possible  by 
defining  the  quantity  t  (before  (3.1))  as  max{10-4,  f,(x0)}  for  example,  rather  than 
max{l,  /,(x0)},  and  by  fixing  p  >  1  to  minimize  the  norm  of  the  gradient  of  y?i(x°). 

We  point  out  a  possible  problem  that  may  occur  with  this  warm  start.  If  the 
initial  point  x°  is  feasible,  then  the  sets  P  and  PM  coincide — up  to  a  perturbation  of 
the  size  10~4.  In  contrast  to  many  interior-point  methods  for  linear  programs,  our 
method  requires  strict  fulfillment  throughout  of  all  inequalities  describing  the  set  Pu 
(since  outside  P^  the  functions  f,  may  not  be  defined  or  may  not  be  convex).  If  the 
set  P  is  “long  and  thin”  (e.g.  a  constraint  of  the  form  |xi|  <  10-5  that  bounds  the 
first  component  of  the  vector  x  makes  the  set  P  very  thin)  and  the  initial  point  x° 
is  feasible  but  far  from  the  optimum,  then  it  may  be  wise  to  define  (3,  as  outlined  in 
(3.1),  rather  than  using  a  warm  start  and  “winding  in  a  long  thin  neighborhood  of 
the  path  of  centers”  in  P  from  x°  to  the  optimal  solution7.  The  definition  in  (3.1) 
is  chosen  to  avoid  such  a  long  thin  path. 

7 A  convex  set  P  being  long  and  thin  is  no  problem  in  theory;  there  is  always  an  affine  mapping 
that  maps  P  into  a  “nearly  round  set”  (i.e.  into  a  set  that  contains  a  ball  of  radius  1  and  is  contained 
in  a  ball  of  radius  n,  the  dimension  of  the  space).  However,  the  affine  mapping  changes  the  condition 
numbers  of  the  Hessians.  Hence,  in  the  context  of  finite  precision,  the  concept  of  affine  invariance 
is  only  of  limited  relevance. 
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A. 4.  The  Multiplier  p 

In  the  definition  of  the  center  (3.5)  we  did  not  elaborate  on  the  choice  of  the  multi¬ 
plier  p  for  the  objective  function.  However,  a  good  choice  of  p  is  very  important.  For 
one-dimensional  examples  it  is  easy  to  see  that  a  poor  choice  can  result  in  the  curve 
of  centers  passing  the  same  point  x  twice  (which  is  not  attractive  when  following 
this  curve  numerically).  We  applied  the  following  heuristic  in  our  implementation. 

•  If  the  constraints  have  been  shifted,  i.e.  if  0  ^  0,  we  compute  a  tangent  tx  for 
the  case  p  =  0  and  a  tangent  t2  for  the  case  that  p  —  1  and  that  the  shift  is 
kept  unchanged  throughout  the  method.  We  then  choose  p  =  ||/2||/||<i  ||. 

•  If  the  constraints  have  not  been  shifted,  i.e.  if  (3  =  0,  then  we  choose  p  such 
that  the  H- norm  ||x'||//  of  the  tangent  x'  at  the  first  iteration  is  1. 

B.  SOME  RESULTS  ON  THE  SETS  PM 

Proof  of  Lemma  1: 

1.  Clear,  because  /?,•  >  0. 

2.  For  x  £  Pi\P  :  /,(x)  >  0.  Hence  /,(x)  -  pfi,  >  0  for  small  p  >  0;  i.e. 
x  g  P^  for  such  p. 

3.  Let  x1  €  P;  i.e.  /.(x1)  <0,  1  <  i  <  m.  Then  xx  +  p(x°  -  x1)  6  P°  because 

fi(xl  +  p(x°  -  X1),//)  =  fi(px°  +  (1  -  p)xx )  -  pi 3, 

<  p/«(x°)  +  (1  -  p)fi(x1)-  p{f,(x°)  +  t)  <  -pt  <  0. 

4.  Let  6  inf^^o  P^  /  0.  By  continuity  of  /,,  6  <  1.  If  p  >  6,  there  is  a 

p  e  [6,p)  for  which  P ^  is  not  empty.  Similarly  to  Part  3,  it  then  follows  with 

x1  £  Pjx  that  P°  ^  0.  (We  note  that  6  =  0  is  possible  if  Pi  is  unbounded.)  | 

Part  2  of  Lemma  1  also  holds  in  the  following  stronger  form. 

Lemma  3 

Let  Kr  :=  {x|  ||x|J2  <  r}  and  Pe  :=  {x|  x  =  y  +  x,  y  6  P ,  ||x||2  <  c}.  Then  for  all 

finite  r  and  all  positive  c  there  is  a  positive  p  such  that 

P^nKr  C  Pe, 

which  implies  that  converges  uniformly  to  P  in  any  ball  A'r.  The  restriction  to 
a  bounded  set  h'T  is  necessary,  since  there  are  examples  for  which  P„  Pi  for  any 
p  >  0. 

Proof:  We  prove  the  first  statement  by  contradiction.  Suppose  there  was  a  finite 
r  and  a  positive  c  such  that  for  all  p  >  0,  /'M  fl  Ar  ^  P(.  Let  pk  —  0.  pk  £  (0.1) 
be  a  sequence  and  xk  £  ( P(1*  0  A’r)\P, .  Since  ||xfc||  <  r  there  is  an  accumulation 
point  x.  Clearly  x  £  P  (otherwise  3?()  :  /i0(x)  >  0  and  then  by  continuity  of 
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3 6*cr  >  0  :  /,0(x  +  h's)  >  a ,  contradicting  the  definition  of  x).  By  construction  we 
also  know  that  x  £  Pf,  in  contradiction  to  PC  P°.  I 

That  the  more  general  statement  “P^  C  P<  for  small  enough  /x”  is  not  true 
can  be  seen  from  a  simple  counterexample.  Take  the  function  /(x,y)  :=  y2,'x  with 
domain  S  :=  {(x,y)|  x  >  1},  defining  P  :=  {(x,y)  £  S|  y2/x  <  0}.  It  is  easy  to 
verify  that  /  is  convex  and  that  PM  =  {(x,y)|  x  >  1,  \y\  <  v/jxx}  <£_  P]  for  any 
/X  >  0. 

Note  that  min{yj  (x,y)  6  P }  exists  in  this  case,  but  not  so  min{y|  (x,y)  €  PM) 
for  /x  >  0,  and  neither  does  a  perturbed  center  exist  for  0  <  /x  <  1.  (This  example 
shows  another  surprising  property.  If  /i  and  f?  are  convex  functions  that  each  have 
a  minimum  on  a  common  closed  set.  5,  then  }\  +  fo  may  not  have  a  minimum  on 
5;  e.g.  take  y1  jx  and  (y  -  1  )2/x  on  the  above  set  S.)  I 

Despite  this  counterexample,  the  vague  intuition  that  P\  might  not  be  much 
"bigger”  than  P  can  be  formalized  in  the  following  simple  lemma. 

Lemma  4 

If  Pj,  is  nonempty  and  bounded  for  some  6  £  [0, 1),  then  so  is  P\. 

Proof:  (by  contradiction.)  Without  loss  of  generality  let  6  =  0  and  x1  =  0  £  P. 
(Then  also  0  6  Pi-)  Suppose  now  that  Pi  is  unbounded.  By  convexity  of  P]  there 
is  a  vector  s  such  that  As  6  Pi  for  all  A  >  0.  Boundedness  of  P  =  P0  implies  that 
there  exists  M  <  oo  such  that  As  ^  P  if  A  >  M,  i.e.  9x'o  :  fiQ(Ms)  >  0.  We  conclude 
that  b  :=  (fi0(Ms)  -  /lo(0))/Af  >  0,  and  hence  for  A  >  M  it  follows  from  convexity 
of  fi0  that 

f>0(\s)  >  fio( Ms)  +  -  M)6  oc 

as  A  —  oo.  This  implies  that  /I0(As)  -  0,  >  0  for  sufficiently  large  A,  which  in  turn 
implies  that  As  ^  Pi  for  such  A,  in  contradiction  to  the  choice  of  s.  I 

C.  CONVERGENCE  OF  THE  CENTERS 

We  are  now  ready  to  show  briefly  the  results  stated  in  Lemma  2,  at  least  for  cer¬ 
tain  cases.  The  results  hold  under  more  general  conditions,  but  the  most  general 
conditions  are  not  of  concern  in  the  present  paper. 

Proof  of  Lemma  2 

Throughout  this  lemma  we  assume  that  Pj  is  bounded  (which  is  the  case,  for  ex¬ 
ample,  if  P  is  nonempty  and  bounded),  and  that  <p\  is  strictly  convex  (as  for  linear 
or  quadratic  or  self-concordant  programs). 

First  we  note  that  a  perturbed  center  x(/x)  exists  for  all  /x  for  which  P°  is  not 
empty,  since  -  5Z^i  ln(-/,(x,/x))  — *  oo  as  x  approaches  the  boundary  of  PM.  If  y?i 
is  strictly  convex  then  so  is  (as  long  as  P°  ^  0),  and  the  perturbed  center  is 
unique. 

The  second  part  of  Lemma  2  follows  from  Lemma  3  since  x(/x)  €  P^  C  Pi  and 
Pi  is  bounded,  i.e.  P]  C  A’r  for  some  r. 
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To  prove  the  third  part  we  note  that  when  multiplying  (3.5)  by  ft  we  obtain 

m 

x(ft)  =  arg  min  fiip^x)  =  arg  min  fo(x)  ~  M  ln( -/.C1^))  -  ftwTx. 

xePn 

Let  :=  x*  +  n(x°  -  x*).  We  conclude  as  in  the  proof  of  Lemma  1  Part  3  that 
f^x^^ft)  <  —fit  for  all  i.  This  implies  that 

m 

-ft  ln(~  ~  pwTx^  <  -nifi  In  (fit)  -  ftwTx M  —  0 

i=i 

as  ft  -+  0.  (Note  that  wTx^  is  bounded  for  ft  G  [0,1].)  By  continuity  of  /0  it  also 
follows  that  fo(x M)  — ♦  /o(x*),  so  that  lim supM_0 fup^x^)  <  fo(x*).  From  Part  2 
above  and  continuity  of  /o  it  follows  that  lim  inf^0  fo(x(p))  ^  fo(x*)-  Further, 
P9m( x(fi))  <  fiip^x^),  and  since  the  logarithmic  barrier  terms  -  ln(-/,(x(p),/t)) 
and  wTx(ft)  are  bounded  below  for  ft  G  [0, 1],  the  claim  follows  from  the  above 
inequalities.  I 
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